The GDP by Region (in million) data obtained from Statistics New Zealand is annually therefore it needs to be converted into monthly data. The process of converting the data into monthly is split into two parts : interpolate the annual data into quarterly and interpolate the converted quarterly data into monthly. The formula used for coverting the annual data into quarterly data is as follow:
\[\text{GDP}_{year,quarter} = \frac{\text{GDP}_{year + 1}}{4} + \frac{\Delta_{year + 1}\times \text{nb}}{10}, \hspace{.3in} \Delta_{year + 1} = \text{GDP}_{year + 1} - \text{GDP}_{year}\] where \(\text{nb} = 1,2,3,4\) corresponding to \(quarter = Q1,Q2,Q3,Q4\), respectively for each \(year = 2008,...,2018\).
The quarterly data can then be converted into monthly data using cubic interpolation (spline() function).
Figure 1. Monthly seasonal time series data
Family A frequently bought oranges in January, March, May, August and November but did not do so during the other months.
The aim is to forecast the 2020 bought oranges for Family A. We can model this data using a regression model with a linear trend and monthly dummy variables, \[y_t = \beta_o + \beta_1t + \beta_2m_{2,t} + \beta_3m_{3,t} + \dots + \beta_{12}m_{12,t} + \epsilon_t\] where \(\beta_1\) is the trend predictor and \(\beta_2m_{2,t},\dots,\beta_{12}m_{12,t}\) are the seasonal dummy predictors for 12 months. Notice that only eleven dummy variables are needed to code twelve categories. That is because the first category (in this case January) is captured by the intercept, and is specified when the dummy variables are all set to zero. In R the trend predictor is coded as trend and the seasonal dummy predictor is coded as season.
sumVal <- tapply(training, cycle(training), FUN = sum)
fit.reg <- tslm(training~ trend + relevel(season, ref = which.min(sumVal)))
names(fit.reg$coefficients)[3:13] <- paste("month", substr(names(fit.reg$coefficients)[3:13],41,42))
summary(fit.reg)
##
## Call:
## tslm(formula = training ~ trend + relevel(season, ref = which.min(sumVal)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.000 -3.917 0.000 8.083 26.667
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.75000 10.91432 1.901 0.0699 .
## trend -0.19444 0.27346 -0.711 0.4842
## month 1 10.11111 13.30698 0.760 0.4551
## month 2 15.63889 13.26477 1.179 0.2505
## month 3 1.16667 13.22807 0.088 0.9305
## month 4 17.36111 13.19694 1.316 0.2013
## month 5 -0.11111 13.17142 -0.008 0.9933
## month 6 5.75000 13.15153 0.437 0.6660
## month 7 -0.05556 13.13731 -0.004 0.9967
## month 8 12.80556 13.12877 0.975 0.3395
## month 10 27.19444 13.12877 2.071 0.0497 *
## month 11 11.38889 13.13731 0.867 0.3949
## month 12 11.91667 13.15153 0.906 0.3743
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.08 on 23 degrees of freedom
## Multiple R-squared: 0.2923, Adjusted R-squared: -0.07697
## F-statistic: 0.7915 on 12 and 23 DF, p-value: 0.6547
The p-value reported is the probability of the estimated \(\beta\) coefficient being as large as it is if there was no real relationship between the response variable and the corresponding predictor. In this case, no months is shown to have an effect on the number of oranges bought, implying seasonality is not significant.
Figure 2. Forecast of the predicted regression model
Figure 3. Time plot of Family A tre and predicted bought oranges
##
## Pearson's product-moment correlation
##
## data: training and fitted(fit.reg)
## t = 3.7472, df = 34, p-value = 0.0006641
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2578955 0.7380690
## sample estimates:
## cor
## 0.5406252
Figure 2 plots the actual (training) data versus the fitted data. If the predictions are close to the actual values,\(\text{R}^2\) is expected to be close to 1. The Pearson correlation test shows that the correlation between these variables is \(\text{R}^2\) = 0.3225668. In this case the model doesn’t seem like an appropriate choice for fitting the data of Family A as it only explains 32% of the variation in the data.
Figure 4. Time plot of Tasman visitors and predicted Tasman visitors
Note: A plot of the residuals against the fitted values should show no pattern. If a pattern is observed, there may be “heteroscedasticity” in the errors which means that the variance of the residuals may not be constant.
The residuals based on the residual plots are not showing any obvious patterns or trends, indicating constant variance.
Figure 5. Residuals of regression model including trend and seasonal dummy predictors
The CV() (short for cross-validation statistic) function computes the CV, AIC, AICc and BIC measures of predictive accuracy for a linear model. For these measures, the model fits the data better with the lowest value of CV, AIC, AICc and BIC; the model fits the data better with the highest value for Adjusted \(\text{R}^2\). Note: This is useful when studying the effect of each predictor, but is not particularly useful for forecasting.
CV(fit.reg)
## CV AIC AICc BIC AdjR2
## 405.81432099 211.83827133 231.83827133 234.00753646 -0.07697186
This function’s purpose is to select the best predictors to use in a regression model when there are multiple predictors. Here the result shows that the seasonal dummy variables are not required in the model.
The stlf() function decomposes the time series using STL, forecast the seasonally adjusted data (data without seasonality) and return the ‘reseasonalized’ forecasts (forecasts that take the seasonality into account). If the method argument is not specified, the function will use the ETS approach applied to the seasonally adjusted series.
STL <- stlf(training, biasadj = TRUE, h = 12)
STL$model$aic
## [1] 318.9962
The nnetar() function in the forecast package for R fits a Neural Network Model (NNAR) to a time series with lagged values of the time series as inputs (and possibly some other exogenous inputs). It is therefore a non-linear autogressive model, allowing complex non-linear relationships between the response variable and its predictors. The NNAR model for seasonal data has the form: \[NNAR(p,P,k)[m]\]
set.seed(2015)
nnetar.model <- nnetar(training)
NNAR <- forecast(nnetar.model, h = 12, biasadj = TRUE)
nnetar.model
## Series: training
## Model: NNAR(1,1,2)[12]
## Call: nnetar(y = training)
##
## Average of 20 networks, each of which is
## a 2-2-1 network with 9 weights
## options were - linear output units
##
## sigma^2 estimated as 65.04
Since NNAR models usually (and in this case) have no underlying statistical model, calculating an AIC/BIC does not make sense here. A possible solution to select the best model is to fit various models to 90% of the data and use these models to forecast the remaining 10%, i.e., use a holdout sample. Choose the model that performs best on the holdout sample (“best” will depend on the error measure(s)). Refit this model based on the entire sample.
### 3. TBATS model Both the NNAR and TBATS models are mainly used for series exhibiting multiple complex seasonalities. TBATS is short for Exponential smoothing state space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components:
The TBATS model can be fitted using the tbats() command in the forecast package for R. The forecast function when running the TBATS model only returns the AIC value hence in this section we are comparing models using AIC. However AIC is not valid for neither NNAR or Combination (the combination is not really model but merely an average of all the methods’forecasts) thus these methods are going to be compared using RMSE.
TBATS <- forecast(tbats(training, biasadj = TRUE, use.box.cox = FALSE), h = 12)
TBATS$model$AIC
## [1] 329.7746
An easy way to improve forecast accuracy is to use several different methods on the same time series, and to average the resulting forecasts. The forecasts used in this example are from the following models: ETS, ARIMA, STL-ETS, NNAR, and TBATS.
Combination <- (ARIMA[["mean"]] + ETS[["mean"]] + STL[["mean"]] + NNAR[["mean"]] +
TBATS[["mean"]])/5
Though the Combination models performance is particularly well in this series, the NNAR model’s performance is shown to have the smallest RMSE error value.
## ARIMA ETS STL-ETS NNAR TBATS Combination
## RMSE 18.5967 17.88111 23.48746 14.3574 18.23517 17.81031
| year2018 | TrueData | ARIMA | ETS | Seasonal | STL | NNAR | TBATS | Combination |
|---|---|---|---|---|---|---|---|---|
| Jan | 0 | 33 | 26 | 23 | 26 | 36 | 29 | 30 |
| Feb | 19 | 23 | 26 | 28 | 32 | 9 | 29 | 24 |
| Mar | 47 | 27 | 26 | 14 | 17 | 37 | 29 | 27 |
| Apr | 2 | 26 | 26 | 30 | 34 | 9 | 29 | 25 |
| May | 46 | 26 | 26 | 12 | 16 | 35 | 29 | 26 |
| June | 15 | 26 | 26 | 18 | 22 | 15 | 29 | 24 |
| Jul | 49 | 26 | 26 | 12 | 17 | 28 | 29 | 25 |
| Aug | 3 | 26 | 26 | 25 | 30 | 18 | 29 | 26 |
| Sep | 33 | 26 | 26 | 11 | 17 | 29 | 29 | 25 |
| Oct | 22 | 26 | 26 | 38 | 44 | 16 | 29 | 28 |
| Nov | 47 | 26 | 26 | 22 | 28 | 36 | 29 | 29 |
| Dec | 25 | 26 | 26 | 23 | 29 | 21 | 29 | 26 |
Overall, the best model for the non-transformed data is the ARIMA model, ARIMA(1,0,0)(0,1,0)[12] and the best model for the transformed data is the Combination model. This statement is roughly based on the difference between the model’s predicted values and the true value of the original data for the Tasman Glacier data.
| year2018 | TrueData | ARIMA | ETS | Seasonal | STL | NNAR | TBATS | Combination |
|---|---|---|---|---|---|---|---|---|
| Jan | 0 | 33 | 26 | 23 | 26 | 36 | 29 | 30 |
| Feb | 19 | 23 | 26 | 28 | 32 | 9 | 29 | 24 |
| Mar | 47 | 27 | 26 | 14 | 17 | 37 | 29 | 27 |
| Apr | 2 | 26 | 26 | 30 | 34 | 9 | 29 | 25 |
| May | 46 | 26 | 26 | 12 | 16 | 35 | 29 | 26 |
| June | 15 | 26 | 26 | 18 | 22 | 15 | 29 | 24 |
| Jul | 49 | 26 | 26 | 12 | 17 | 28 | 29 | 25 |
| Aug | 3 | 26 | 26 | 25 | 30 | 18 | 29 | 26 |
| Sep | 33 | 26 | 26 | 11 | 17 | 29 | 29 | 25 |
| Oct | 22 | 26 | 26 | 38 | 44 | 16 | 29 | 28 |
| Nov | 47 | 26 | 26 | 22 | 28 | 36 | 29 | 29 |
| Dec | 25 | 26 | 26 | 23 | 29 | 21 | 29 | 26 |
https://www.stats.govt.nz/information-releases/regional-gross-domestic-product-year-ended-march-2018
https://robjhyndman.com/hyndsight/nnetar-prediction-intervals/
Page built on: 📆 2020-07-10 ‒ 🕢 12:40:43